44 research outputs found

    Incremental Learning in Diagonal Linear Networks

    Diagonal linear networks (DLNs) are a toy simplification of artificial neural networks; they consist of a quadratic reparametrization of linear regression that induces a sparse implicit regularization. In this paper, we describe the trajectory of the gradient flow of DLNs in the limit of small initialization. We show that incremental learning is effectively performed in the limit: coordinates are successively activated, while the iterate is the minimizer of the loss constrained to have support on the active coordinates only. This shows that the sparse implicit regularization of DLNs decreases with time. For technical reasons, this work is restricted to the underparametrized regime with anti-correlated features.
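
    The successive activation of coordinates can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the abstract: the parametrization theta = u*u - v*v (one common quadratic reparametrization), a loss 0.5 * ||theta - theta_star||^2 that decouples across coordinates, and plain gradient descent with a small step in place of gradient flow. The names `train_dln`, `alpha`, and `activation_step` are hypothetical.

```python
def train_dln(theta_star, alpha=1e-4, step_size=0.01, steps=3000):
    """Gradient descent on 0.5 * ||theta - theta_star||^2 with theta = u*u - v*v.

    Assumed toy setting: the loss decouples across coordinates, and both
    factors start at the small value alpha.
    """
    d = len(theta_star)
    u = [alpha] * d
    v = [alpha] * d
    activation_step = [None] * d  # first step at which a coordinate reaches half its target
    for step in range(steps):
        theta = [ui * ui - vi * vi for ui, vi in zip(u, v)]
        for i in range(d):
            if activation_step[i] is None and abs(theta[i]) >= 0.5 * abs(theta_star[i]):
                activation_step[i] = step
        # Gradient of the loss with respect to theta.
        grad = [t - ts for t, ts in zip(theta, theta_star)]
        # Chain rule: dL/du = 2*u*grad and dL/dv = -2*v*grad.
        u = [ui - step_size * 2.0 * ui * g for ui, g in zip(u, grad)]
        v = [vi + step_size * 2.0 * vi * g for vi, g in zip(v, grad)]
    theta = [ui * ui - vi * vi for ui, vi in zip(u, v)]
    return theta, activation_step


theta_star = [2.0, 0.5]
theta_final, activation_step = train_dln(theta_star)
print(activation_step)  # the coordinate with the larger target activates first
print(theta_final)
```

    In this toy setting each coordinate stays near zero for a time that grows as the initialization shrinks, then rises quickly to its target, which gives a rough picture of the stepwise behaviour described above.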

    Accelerated Gossip in Networks of Given Dimension using Jacobi Polynomial Iterations

    Consider a network of agents connected by communication links, where each agent holds a real value. The gossip problem consists in estimating the average of the values diffused in the network in a distributed manner. We develop a method solving the gossip problem that depends only on the spectral dimension of the network, that is, in the communication network set-up, the dimension of the space in which the agents live. This contrasts with previous work that required the spectral gap of the network as a parameter, or suffered from slow mixing. Our method shows an important improvement over existing algorithms in the non-asymptotic regime, i.e., when the values are far from being fully mixed in the network. Our approach stems from a polynomial-based point of view on gossip algorithms, as well as an approximation of the spectral measure of the graphs with a Jacobi measure. We show the power of the approach with simulations on various graphs, and with performance guarantees on graphs of known spectral dimension, such as grids and random percolation bonds. An extension of this work to distributed Laplacian solvers is discussed. As a side result, we also use the polynomial-based point of view to show the convergence of the message passing algorithm for gossip of Moallemi & Van Roy on regular graphs. The explicit computation of the rate of convergence shows that message passing has a slow rate of convergence on graphs with small spectral gap.
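
    For orientation, the gossip problem itself can be sketched with plain synchronous averaging on a ring. This is the simple baseline, not the Jacobi polynomial iteration of the paper; the graph, the number of rounds, and the names `gossip_step`, `ring`, and `run_gossip` are illustrative assumptions.

```python
def gossip_step(values, neighbors):
    """One synchronous round: each agent averages its value with its neighbors' values."""
    return [
        (values[i] + sum(values[j] for j in neighbors[i])) / (1 + len(neighbors[i]))
        for i in range(len(values))
    ]


def ring(n):
    """Adjacency lists of a ring of n agents (every agent has two neighbors)."""
    return [[(i - 1) % n, (i + 1) % n] for i in range(n)]


def run_gossip(values, neighbors, rounds):
    for _ in range(rounds):
        values = gossip_step(values, neighbors)
    return values


n = 20
initial = [float(i) for i in range(n)]
target = sum(initial) / n  # the average every agent should estimate
final = run_gossip(initial, ring(n), 500)
max_error = max(abs(v - target) for v in final)
print(target, max_error)
```

    On a regular graph such as the ring, this averaging rule preserves the mean, so every agent's value approaches the global average; the number of rounds needed grows as the spectral gap of the graph shrinks, which is the slow-mixing behaviour that accelerated methods aim to improve.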

    Leveraging the two timescale regime to demonstrate convergence of neural networks

    We study the training dynamics of shallow neural networks, in a two-timescale regime in which the stepsizes for the inner layer are much smaller than those for the outer layer. In this regime, we prove convergence of the gradient flow to a global optimum of the non-convex optimization problem in a simple univariate setting. The number of neurons need not be asymptotically large for our result to hold, distinguishing our result from popular recent approaches such as the neural tangent kernel or mean-field regimes. Experimental illustration is provided, showing that stochastic gradient descent behaves according to our description of the gradient flow and thus converges to a global optimum in the two-timescale regime, but can fail outside of this regime. Comment: 33 pages, 7 figures

    Graph-based Approximate Message Passing Iterations

    Approximate message passing (AMP) algorithms have become an important element of high-dimensional statistical inference, mostly due to their adaptability and concentration properties, the state evolution (SE) equations. This is demonstrated by the growing number of new iterations proposed for increasingly complex problems, ranging from multi-layer inference to low-rank matrix estimation with elaborate priors. In this paper, we address the following questions: is there a structure underlying all AMP iterations that unifies them in a common framework? Can we use such a structure to give a modular proof of state evolution equations, adaptable to new AMP iterations without reproducing the full argument each time? We propose an answer to both questions, showing that AMP instances can be generically indexed by an oriented graph. This enables a unified interpretation of these iterations, independent of the problem they solve, and gives a way of composing them arbitrarily. We then show that all AMP iterations indexed by such a graph admit rigorous SE equations, extending the reach of previous proofs, and proving a number of recent heuristic derivations of those equations. Our proof naturally includes non-separable functions and we show how existing refinements, such as spatial coupling or matrix-valued variables, can be combined with our framework. Comment: 59 pages, 24 main, 35 appendix

    Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model

    In the context of statistical supervised learning, the noiseless linear model assumes that there exists a deterministic linear relation $Y = \langle \theta_*, X \rangle$ between the random output $Y$ and the random feature vector $\Phi(U)$, a potentially non-linear transformation of the inputs $U$. We analyze the convergence of single-pass, fixed step-size stochastic gradient descent on the least-square risk under this model. The convergence of the iterates to the optimum $\theta_*$ and the decay of the generalization error follow polynomial convergence rates with exponents that both depend on the regularities of the optimum $\theta_*$ and of the feature vectors $\Phi(u)$. We interpret our result in the reproducing kernel Hilbert space framework. As a special case, we analyze an online algorithm for estimating a real function on the unit interval from the noiseless observation of its value at randomly sampled points; the convergence depends on the Sobolev smoothness of the function and of a chosen kernel. Finally, we apply our analysis beyond the supervised learning setting to obtain convergence rates for the averaging process (a.k.a. gossip algorithm) on a graph depending on its spectral dimension.
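
    The single-pass, fixed step-size recursion analyzed here can be sketched in a finite-dimensional toy case. The assumptions are illustrative and not from the abstract: features drawn uniformly from [-1, 1]^d, a hypothetical `theta_star`, and a step size chosen so that step_size * ||x||^2 <= 1. The paper's nonparametric setting and its rates are not reproduced by this sketch.

```python
import random


def sgd_noiseless(theta_star, steps, step_size, seed=0):
    """Single-pass SGD on the least-squares risk with noiseless outputs y = <theta_star, x>."""
    rng = random.Random(seed)
    d = len(theta_star)
    theta = [0.0] * d
    for _ in range(steps):
        # Fresh sample at every step (single pass): features uniform in [-1, 1]^d.
        x = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        y = sum(t * xi for t, xi in zip(theta_star, x))  # noiseless observation
        residual = sum(t * xi for t, xi in zip(theta, x)) - y
        # Stochastic gradient step on 0.5 * (<theta, x> - y)^2.
        theta = [t - step_size * residual * xi for t, xi in zip(theta, x)]
    return theta


def distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5


theta_star = [1.0, -2.0, 0.5, 3.0, -1.5]
initial_error = distance([0.0] * len(theta_star), theta_star)
theta_hat = sgd_noiseless(theta_star, steps=2000, step_size=0.2)
final_error = distance(theta_hat, theta_star)
print(initial_error, final_error)
```

    Because the observations carry no noise, each update is a contraction of the error along the sampled feature direction, so a constant step size suffices and no averaging or step-size decay is needed in this sketch.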

    A Continuized View on Nesterov Acceleration

    We introduce the "continuized" Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter. The two variables continuously mix following a linear ordinary differential equation and take gradient steps at random times. This continuized variant benefits from the best of the continuous and the discrete frameworks: as a continuous process, one can use differential calculus to analyze convergence and obtain analytical expressions for the parameters; but a discretization of the continuized process can be computed exactly with convergence rates similar to those of Nesterov's original acceleration. We show that the discretization has the same structure as Nesterov acceleration, but with random parameters.

    Massive Nest-Box Supplementation Boosts Fecundity, Survival and Even Immigration without Altering Mating and Reproductive Behaviour in a Rapidly Recovered Bird Population

    Habitat restoration measures may result in artificially high breeding density, for instance when nest-boxes saturate the environment, which can negatively impact species' demography. Potential risks include changes in mating and reproductive behaviour such as increased extra-pair paternity, conspecific brood parasitism, and polygyny. Under particular circumstances, these mechanisms may disrupt reproduction, with populations dragged into an extinction vortex. With the use of nuclear microsatellite markers, we investigated the occurrence of these potentially negative effects in a recovered population of a rare secondary cavity-nesting farmland bird of Central Europe, the hoopoe (Upupa epops). High intensity farming in the study area has resulted in a total eradication of cavity trees, depriving hoopoes of breeding sites. An intensive nest-box campaign rectified this problem, resulting in a spectacular population recovery within only a few years. There was some concern, however, that the new, artificially induced high breeding density might alter hoopoe mating and reproductive behaviour. As the species underwent a serious demographic bottleneck in the 1970s-1990s, we also used the microsatellite markers to reconstitute the demo-genetic history of the population, looking in particular for signs of genetic erosion. We found i) a low occurrence of extra-pair paternity, polygyny and conspecific brood parasitism, ii) a high level of neutral genetic diversity (mean number of alleles and expected heterozygosity per locus: 13.8 and 83%, respectively) and, iii) evidence for genetic connectivity through recent immigration of individuals from well differentiated populations. The recent increase in breeding density has thus not so far induced any noticeable detrimental changes in mating and reproductive behaviour. Furthermore, the demographic bottleneck undergone by the population in the 1970s-1990s was not accompanied by any significant drop in neutral genetic diversity. Finally, genetic data converged with a concomitant demographic study to show that immigration strongly contributed to local population recovery.